Systems with two types of agents with a preference for heterophilous
interaction produce networks that are more or less close to bipartite. We
propose two measures quantifying the notion of bipartivity. The two measures
(one well-known and natural, but computationally intractable; one
computationally less complex, but also less intuitive) are examined on model
networks that continuously interpolate between bipartite graphs and graphs
with many odd circuits. We find that the bipartivity measures increase as we
tune the control parameters of the test networks to intuitively increase the
bipartivity, and thus conclude that the measures are relevant. We also measure
and discuss the values of our bipartivity measures for empirical social
networks (constructed from professional collaborations, Internet communities
and field surveys). Here we find, as expected, that networks arising from
romantic online interaction have high, and professional collaboration networks
have low, bipartivity values. In some other cases, probably due to the low
average degree of the network, the bipartivity measures cannot distinguish
between romantic and friendship-oriented interaction.

PACS numbers: 89.75.Fb, 89.75.Hc, 05.50.+q 
I. INTRODUCTION

Any system, natural or man-made, consisting of entities that interact
pairwise can be described in terms of a network. Real-life networks often
contain some degree of randomness, and also have some structure arising from
the strategies or laws the entities follow to make new contacts. Such
networks, which can only be described as having both randomness and
structure, are called complex networks and have lately received much
attention in the physicist community. Among the most important developments
in this recent surge of activity in network research is arguably the
categorization and quantification of static network structures such as
clustering [3], degree distribution [4], assortative mixing coefficient [5],
grid coefficient [6], etc. A network with no circuit of odd length is called
bipartite. Many systems are naturally modeled as bipartite networks:
Biochemical networks can be described by vertices representing chemical
substances separated by vertices representing chemical reactions [7]. As
another example, we have the so-called "two-mode" representation of
affiliation networks, where one kind of vertex represents e.g. organizations
and the other kind represents individual actors, and the edges indicate to
which organizations an actor belongs. But there are also networks that are
not necessarily bipartite, but closer to bipartite than what can be expected
from a completely random network. Examples of such networks are those that
are formed by two types of agents with a preference for heterophilous
interaction (human sexual contacts [8, 9], and human romance or partnership
networks [10] being two cases). In many cases one knows the type of the
individual vertices (the gender of the actors in the examples above) [11],
but in other cases such information might be lacking (see the data studied in
Ref. [12] for a concrete example). Nevertheless, the 'bipartivity' (how far
away from being bipartite a graph is) is a measurable structure and
therefore, we believe, deserves attention.

How can we measure bipartivity? The idea we use in this paper is the
following: We suppose that all agents of one type tried their best to form a
connection to an agent of the other type. Then we measure to what extent this
assumption fails. We can assign a label $\sigma_v \in \{-1, +1\}$ to each
vertex $v$ and check for the maximal fraction of edges between vertices of
different sign. This fraction will be equal to or higher than the actual
fraction of edges between vertices of different type. But, at least for
strong heterophilous preference in the network formation, the difference
should be small. For weak heterophilous preference this approach will likely
fail to produce a correct classification of the individual vertices. Still,
the number of even circuits should be larger than in a network created under
the same circumstances but with no heterophilous preference; and the latter
network will (as we will see) give a lower value of such a bipartivity
measure. So even if we cannot reproduce the correct fraction of edges between
vertices of different type, we have a measure that is a monotonic function of
the strength of the heterophilous preference. It is convenient (at least for
people familiar with statistical mechanics) to phrase a problem like this in
terms of the antiferromagnetic Ising model. Our bipartivity measure, the
maximal fraction of edges between vertices of different sign, is directly
related to the ground state energy of the antiferromagnetic Ising model (the
relation is given in Sect. II A 1). Throughout the paper we will often use
the terminology of such spin systems. For example, we talk of an edge between
two vertices of the same label as a 'frustrated' edge.



The spin system analogy to combinatorial optimization problems such as the
one we are facing (to find the minimal fraction of frustrated edges) is
nothing new. With this approach the fraction of frustrated edges defines a
cost function corresponding to the energy of the spin system. The two most
studied problems in this area are the p-coloring problem and the graph
bisection problem. In the p-coloring problem the question is whether or not
the vertices of a graph can each be assigned one of p colors in such a way
that no edge goes between two vertices of the same color. This problem is
solvable in linear time for p = 2, but NP-complete (i.e. no polynomial-time
algorithm is known for the general case [13]) for p > 2. The graph bisection
problem (also NP-complete) is to partition the vertex set into two sets of
equal size such that the number of edges between the two sets is minimized
[14, 15, 16]. Both these problems can, just as ours, be phrased in terms of
spin models with antiferromagnetic interaction. Our minimization problem
differs from the bisection problem in that the two sections can have
arbitrary sizes. However, as in the bisection and p-coloring problems, we are
also faced with an NP-complete optimization problem. (Our aim, to find the
ground state energy of the antiferromagnetic Ising model, can be mapped to a
max-cut problem [17], which is NP-hard on general networks [18].)
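The linear-time p = 2 case mentioned above can be illustrated with a breadth-first two-coloring. This is a minimal sketch, not code from the paper; the function name `two_color` and the edge-list input format are assumptions made for illustration.

```python
from collections import deque

def two_color(n, edges):
    """Return labels in {-1, +1} for vertices 0..n-1 if the graph is
    bipartite, otherwise None. Runs in O(N + M) time."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    sigma = [0] * n  # 0 means "not yet labelled"
    for start in range(n):
        if sigma[start] != 0:
            continue
        sigma[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if sigma[w] == 0:
                    sigma[w] = -sigma[u]
                    queue.append(w)
                elif sigma[w] == sigma[u]:
                    return None  # a frustrated edge is forced: odd circuit
    return sigma

square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # even circuit, bipartite
triangle = [(0, 1), (1, 2), (2, 0)]         # odd circuit, not bipartite
print(two_color(4, square), two_color(3, triangle))
```

The check only answers whether a frustration-free labelling exists; it gives no information on how many edges must be frustrated when it does not, which is the hard part of the problem.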



As the spin models of statistical physics are familiar to statistical
physicists, it is not surprising that topics like the Ising and XY models on
various model networks [19, 20] have received much attention in physicists'
network literature. The motivation for such studies, as models of real-world
systems, is that they can capture some features of opinion formation or
similar social processes [21]. The present work can also be described as a
study of a spin model on a complex network, but unlike the above-mentioned
studies, the spin model is used as a tool to measure a static network
structure.



II. THE MEASURES 

In the following sections we will go through the two 
bipartivity measures. We state the definitions, dissect 
the algorithms and give analytic discussions about the 
limit properties. 

We represent an undirected network by $G = (V, E)$ and a directed network by
$G_{\mathrm{dir}} = (V, A)$, where $V$ is the set of vertices, $E$ is a set
of edges (or undirected pairs of vertices), and $A$ is a set of arcs (or
ordered pairs of vertices). A path of length $l$ is a sequence of vertices
$v_1, \dots, v_l$ such that $(v_i, v_{i+1}) \in E$ (or
$(v_i, v_{i+1}) \in A$ for directed graphs); a circuit is a path where the
first and last vertex are identical. In an elementary path, or circuit, no
vertex appears twice (except the first and last in the case of circuits). In
the present paper we will only talk about elementary paths and circuits, so
for brevity we omit the word 'elementary.' Throughout the paper, when
necessary, we let a sub- or superscript 'dir' denote directed versions of
quantities. In many cases the generalization from undirected to directed
networks is straightforward; in these cases we will pursue the discussion in
the framework of undirected networks.



A. The measure $b_1$

1. Definition 

The first measure we consider is simply the fraction of unfrustrated edges in
the ground state of the antiferromagnetic Ising model on the network. In
terms of the antiferromagnetic Ising model the quantity can be written as

$$ b_1 = 1 - \frac{M_{\mathrm{fr}}}{M} = \frac{1}{2} - \frac{E_0}{2M}, \tag{1} $$

where $M_{\mathrm{fr}}$ is the number of frustrated edges in the ground state
(the usual cost function in the two-coloring problem) and $E_0$ is the ground
state energy

$$ E_0 = \min_{\{\sigma_v\}} H, \tag{2} $$

where $H$ is the Hamiltonian of the antiferromagnetic Ising model:

$$ H = \sum_{(v,w) \in E} \sigma_v \sigma_w, \tag{3a} $$

$$ H_{\mathrm{dir}} = \sum_{(v,w) \in A} \sigma_v \sigma_w. \tag{3b} $$

The directed quantity is obtained by substituting $H_{\mathrm{dir}}$ for $H$,
and arcs for edges, in the above discussion. The topology of the energy
landscape is determined by the underlying network, and can in general be very
complex [22].
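As a small illustration of how the energy and the number of frustrated edges are related, the sketch below evaluates both for a given spin assignment. It assumes a unit-coupling Hamiltonian summed once over each edge (an assumption about normalization), under which $E = 2M_{\mathrm{fr}} - M$ holds for every configuration; the function names are hypothetical.

```python
def energy(edges, sigma):
    """H = sum over edges (v, w) of sigma_v * sigma_w (assumed unit coupling)."""
    return sum(sigma[v] * sigma[w] for v, w in edges)

def frustrated(edges, sigma):
    """Number of edges whose end points carry the same sign."""
    return sum(1 for v, w in edges if sigma[v] == sigma[w])

def unfrustrated_fraction(edges, sigma):
    """Fraction of edges joining vertices of opposite sign."""
    return 1 - frustrated(edges, sigma) / len(edges)

triangle = [(0, 1), (1, 2), (2, 0)]
sigma = [1, -1, 1]  # one of the ground states of an isolated triangle
m = len(triangle)

# Each frustrated edge contributes +1 and each unfrustrated edge -1 to H.
print(energy(triangle, sigma), 2 * frustrated(triangle, sigma) - m)
print(unfrustrated_fraction(triangle, sigma), 0.5 - energy(triangle, sigma) / (2 * m))
```

Evaluated at a ground state, the last line gives the fraction of unfrustrated edges in two equivalent ways, once by counting and once from the energy.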



2. Limit properties 

The $b_1$ measure takes values in the interval $(1/2, 1]$. The upper bound is
attained for bipartite graphs. It is easy to see that $b_1$ cannot be lower
than 1/2: Consider a ground state configuration for which the opposite is
true. Then there must be at least one vertex with more than half of its edges
frustrated. Flipping this spin would reduce the energy, which contradicts the
fact that the system is in the ground state. We do not know if this bound is
realized for any finite graph, but $b_1 = 1/2$ is the limit value of $b_1$
for a fully connected graph as $N \to \infty$: Partition the fully connected
graph $K_N$ of $N$ vertices (and $M = N(N-1)/2$ edges) into one set of $N'$
and one set of $N - N'$ vertices, and assign opposite spins to the elements
of these sets. The number of frustrated edges is precisely the number of
edges within each set, which is:

$$ \frac{N'(N'-1)}{2} + \frac{(N-N')(N-1-N')}{2} = M - N'(N-N'). \tag{4} $$



Thus the minimum number of frustrated edges is exactly $N^2/4 - N/2$ for
$N' = N/2$, and the fraction of unfrustrated edges is

$$ b_1 = \frac{1}{2 - 2/N} \to \frac{1}{2} \quad \text{as } N \to \infty. \tag{5} $$



The above arguments can be generalized to directed net- 
works straightforwardly. 
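The limit value for complete graphs can be checked numerically for small even N. The sketch below is an exhaustive search over all spin assignments (feasible only for very small graphs, and not the method used in the paper); the function names are hypothetical.

```python
from itertools import combinations, product

def b1_brute_force(n, edges):
    """Maximal fraction of edges between vertices of opposite sign,
    found by enumerating all 2^(n-1) inequivalent spin assignments."""
    best = 0
    for rest in product((-1, 1), repeat=n - 1):
        sigma = (1,) + rest  # fix one spin: a global flip changes nothing
        cut = sum(1 for v, w in edges if sigma[v] != sigma[w])
        best = max(best, cut)
    return best / len(edges)

def complete_graph(n):
    """Edge list of the fully connected graph on n vertices."""
    return list(combinations(range(n), 2))

for n in (4, 6, 8):
    # The second and third columns should agree for even n.
    print(n, b1_brute_force(n, complete_graph(n)), 1 / (2 - 2 / n))
```

The exponential cost of this enumeration is the reason a stochastic minimization is needed for networks of realistic size.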



3. Minimization by exchange Monte Carlo 

The complexity of the "energy landscape" of the antiferromagnetic Ising model
on an arbitrary network is difficult to judge a priori. There are indications
that no natural network would be too hard for a regular simulated annealing
approach [14, 23]. To be on safer ground, we use a Monte Carlo scheme that is
known to be very efficient at sweeping even an extremely 'rugged' energy
landscape without getting stuck in local minima: the so-called exchange Monte
Carlo (XMC) [24]. The idea of exchange Monte Carlo is to run standard
Metropolis Monte Carlo for $N_T$ replicas of the system, each at a specific
temperature. Then from time to time two replicas at adjacent temperatures are
compared, and with a probability

$$ P = \begin{cases} 1 & \text{if } \Delta < 0 \\ e^{-\Delta} & \text{otherwise,} \end{cases} \tag{6} $$

where

$$ \Delta = \left( \frac{1}{T'} - \frac{1}{T} \right) (E - E'), \tag{7} $$

and $E$ is the energy of the configuration at temperature $T$ (similarly for
$T'$ and $E'$), and $T < T'$, the two replicas are swapped between the
temperatures. This condition is designed so that the Monte Carlo scheme
preserves the Boltzmann distribution. This is not decisive for us, since we
are looking for the ground state energy rather than performing a proper
sampling of the configuration space, but it is nevertheless kept in our
measurements. Besides just running the XMC scheme we also periodically quench
the system, i.e. we sweep through all vertices of the network consecutively
and flip spins that lower the energy. The sweeps are continued until a sweep
with no spin flips has occurred. For later reference we introduce the
notations $t_{\mathrm{avg}}$ for the total number of MC sweeps (we refer to
the number of MC sweeps as 'time'), $t_{\mathrm{quench}}$ for the time
between each quench, $t_{\mathrm{exch}}$ for the time between exchange
trials, and $t_{\mathrm{measure}}$ for the time between measurement sweeps
(where the energy is sampled).

FIG. 1: Some graphs in the discussion of the $b_2$ quantity. The coloring of
the vertices minimizes $M_{\mathrm{fr}}$. Black edges indicate frustration.
(a) An almost bipartite graph with many triangles. (b) A graph where all odd
circuits contribute to the frustration. (c) A graph where only the shortest
circuits contribute to the frustration.

For the exchange Monte Carlo scheme to efficiently sample the configuration
space, all replicas need to tour the whole range of temperatures in a
reasonably short time. At the same time one would not like the exchange
trials, at any neighboring temperatures, to be constantly affirmative, since
then the separation of the two temperatures would be of no use. We follow
Ref. [23] and choose the temperature set

$$ T_i = T_{\mathrm{low}} \left( \frac{T_{\mathrm{high}}}{T_{\mathrm{low}}} \right)^{(i-1)/(N_T-1)}, \tag{8} $$

where $1 \le i \le N_T$ enumerates the replicas, and $T_{\mathrm{low}}$ and
$T_{\mathrm{high}}$ are the lowest and highest temperatures, respectively. To
find the actual parameter values (which will be stated in Secs. IV A and
IV B) one has to check that the replicas travel throughout the temperature
range with reasonable exchange ratios for all temperature gaps.
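The scheme described in this section can be sketched in a few dozen lines. The code below is an illustrative sketch rather than the authors' implementation: it assumes the unit-coupling Hamiltonian $H = \sum_{(v,w)} \sigma_v \sigma_w$, a geometric temperature ladder, and the standard replica-exchange acceptance rule; all parameter names and default values are hypothetical and would need tuning for real networks.

```python
import math
import random

def xmc_b1(n, edges, n_t=8, t_low=0.1, t_high=3.0, sweeps=200,
           t_exch=1, t_quench=10, seed=0):
    """Estimate b1 by exchange Monte Carlo with periodic quenching."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for v, w in edges:
        adj[v].append(w)
        adj[w].append(v)

    def local_field(s, v):
        return sum(s[w] for w in adj[v])

    def energy(s):
        return sum(s[v] * s[w] for v, w in edges)

    def quench(s):
        # Zero-temperature sweeps on a copy until no flip lowers the energy.
        s = s[:]
        changed = True
        while changed:
            changed = False
            for v in range(n):
                if s[v] * local_field(s, v) > 0:  # flipping v lowers H
                    s[v] = -s[v]
                    changed = True
        return energy(s)

    # Geometric temperature ladder from t_low to t_high (requires n_t >= 2).
    temps = [t_low * (t_high / t_low) ** (i / (n_t - 1)) for i in range(n_t)]
    replicas = [[rng.choice((-1, 1)) for _ in range(n)] for _ in temps]
    energies = [energy(s) for s in replicas]
    best = min(energies)

    for t in range(1, sweeps + 1):
        # One Metropolis sweep (n single-spin trials) per replica.
        for k, temp in enumerate(temps):
            s = replicas[k]
            for _ in range(n):
                v = rng.randrange(n)
                d_e = -2 * s[v] * local_field(s, v)  # energy change if v flips
                if d_e <= 0 or rng.random() < math.exp(-d_e / temp):
                    s[v] = -s[v]
                    energies[k] += d_e
        # Exchange trials between adjacent temperatures, temps[k] < temps[k+1].
        if t % t_exch == 0:
            for k in range(n_t - 1):
                delta = ((1 / temps[k + 1] - 1 / temps[k])
                         * (energies[k] - energies[k + 1]))
                if delta < 0 or rng.random() < math.exp(-delta):
                    replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
                    energies[k], energies[k + 1] = energies[k + 1], energies[k]
        best = min(best, min(energies))
        if t % t_quench == 0:
            best = min(best, min(quench(s) for s in replicas))

    # Lowest energy seen, converted to a fraction of unfrustrated edges.
    return 0.5 - best / (2 * len(edges))

ring = [(i, (i + 1) % 6) for i in range(6)]
print(xmc_b1(6, ring))
```

Because the search is stochastic, the returned value is a lower bound on $b_1$ that becomes tight only if the ground state is actually visited; on larger networks one would monitor the exchange ratios between neighboring temperatures as described above.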



B. The measure $b_2$

Apart from finding an approximate value of $b_1$, one can also define a
quantity that is exactly solvable in polynomial time. Our intention is
primarily not to make a heuristic algorithm for calculating $b_1$, but rather
to define a quantity that captures the same structure, i.e. that grows
monotonically with $b_1$.

That a graph contains no odd circuits is the defining property of
bipartiteness [25]. It is thus natural to base a bipartivity measure on an
odd-circuit count in some way. Unfortunately, defining a quantity in this way
becomes a little more complicated than at first expected. One complication is
that a graph can be very close to bipartite and still contain many odd
circuits (see Fig. 1(a)). A way of dealing with this problem is to mark as
few edges as possible such that each odd circuit contains at least one marked
edge. In many cases a marked edge will correspond to a frustrated edge of the
ground state of the antiferromagnetic Ising model. In Fig. 1(a) only the
upper, horizontal edge needs to be marked. Another problem one faces is how
to deal with odd circuits of different length: in a network with very few odd
circuits, a circuit of, say, length seven would contribute as much to the
global frustration of the network as a triangle (a subgraph of three adjacent
vertices; see Fig. 1(b)). But in many real networks the total length of the
odd circuits is very long (this is true for all networks we measure, see
Sect. III B), much larger than $M$; in these cases the short circuits are in
general the most important in determining the ground state configuration. For
example, in Fig. 1(c) $M = 23$, and while we have 11 triangles, summing the
lengths of all odd circuits gives 218 (33 from the 11 triangles, 45 from the
nine circuits of length five, and so on). However, only the triangles
contribute to the ground state configuration, in the sense that each triangle
has the same configuration as the ground state of an isolated triangle, while
the odd circuits of length larger than four (e.g. the periphery) do not have
the best coloring for a circulant of that length. To deal with this we need
to weight short circuits higher than long ones. We will do this by assigning
a cut-off length and neglecting all circuits exceeding this length.

1. Definition 

Now, we make an algorithm of the above ideas as follows: Let $C_n$ be the set
of odd circuits of length $\le n$. Let $\Sigma(C_n)$ be the accumulated
length of the circuits in $C_n$ (so, for example, $\Sigma(C_3) = 3$ in
Fig. 1(b)). Now we assign the cut-off $3M$ to $\Sigma(C_n)$, and let
$\tilde{n}$ be the smallest $n$ such that $\Sigma(C_n) > 3M$. Next we turn to
the marking procedure sketched above. Let $\nu(e)$ denote the number of
circuits in $C_{\tilde{n}}$ passing through the edge $e$. Clearly edges of
high $\nu$ are likely to be frustrated in the ground state (cf. Fig. 1(a)).
We now estimate $M_{\mathrm{fr}}$ roughly as the number of edges that have to
be marked so that each odd circuit of length $\le \tilde{n}$ is marked at
least once. To be precise we perform the following algorithm:

1. Start with $C = C_{\tilde{n}}$.

2. Sort the edges in order of $\nu$.

3. Repeat the following while $C \neq \emptyset$:

(a) Mark the edge $e$ with highest $\nu$.

(b) Remove all circuits in $C$ containing $e$.

(c) Recalculate $\nu$ for each edge.

Then the number of iterations $m'$ is the assessment of $M_{\mathrm{fr}}$,
and we define our bipartivity measure as

$$ b_2 = 1 - \frac{m'}{M}. \tag{9} $$



This algorithm is not an attempt to actually identify the frustrated edges;
rather it is supposed to give a high $M_{\mathrm{fr}}$ for a system with high
(total) geometric frustration, and vice versa. Firstly, it does not
necessarily find the minimal number of edges that need to be marked for all
odd circuits of length at most $\tilde{n}$ to contain a marked edge. But we
expect this steepest descent optimization to come close in most cases.
Secondly, an odd circuit can in reality only have an odd number of frustrated
edges, but in the algorithm there is no such restriction on the number of
marked edges.

In case there is more than one edge with the highest $\nu$ (in step 3a of the
algorithm) we choose the edge to mark at random. The variance between
different random seeds turns out to be negligible in most cases. We will run
the algorithm for different seeds to choose the highest $b_2$ value, and to
get an idea about the error in $b_2$ from the selection of the edge to mark.
An alternative (and more ambitious) approach would be to iterate the whole
calculation until the highest $b_2$ has reappeared a fixed number of times
(cf. [26]).
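The marking procedure, including the random tie-breaking, can be sketched as follows. The sketch assumes the odd circuits up to the cut-off length have already been found and are given as vertex sequences; for brevity it recounts $\nu$ from scratch in every iteration rather than updating it incrementally, so it does not reproduce the running time of a careful implementation. The names `greedy_b2` and `circuit_edges` are hypothetical.

```python
import random
from collections import Counter

def circuit_edges(circuit):
    """Undirected edges of a circuit given as a vertex sequence
    (the last vertex is implicitly joined to the first)."""
    k = len(circuit)
    return {frozenset((circuit[i], circuit[(i + 1) % k])) for i in range(k)}

def greedy_b2(num_edges, odd_circuits, seed=0):
    """Greedy estimate: mark the edge lying on most remaining circuits
    until every circuit contains a marked edge; return 1 - m'/M."""
    rng = random.Random(seed)
    remaining = [circuit_edges(c) for c in odd_circuits]
    marked = 0
    while remaining:
        # nu[e] = number of remaining circuits passing through edge e
        nu = Counter(e for c in remaining for e in c)
        top = max(nu.values())
        candidates = sorted((e for e, count in nu.items() if count == top),
                            key=sorted)
        e = rng.choice(candidates)  # ties are broken at random
        remaining = [c for c in remaining if e not in c]
        marked += 1
    return 1 - marked / num_edges

# Two triangles sharing the edge {1, 2}; the graph has M = 5 edges.
triangles = [[0, 1, 2], [1, 2, 3]]
print(greedy_b2(5, triangles))
```

In this example the shared edge lies on both triangles and is marked first, so a single mark suffices; running with several seeds, as described above, matters only when ties occur.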

If we assume a sparse network (i.e. $N \propto M$) the running time of the
algorithm above is $O(M^2)$. To see this we first note that there can be at
most $O(M)$ iterations at step 3. To find the edge with highest $\nu$ (in
step 3a) we do not need to sort all edges more than once (as done in step 2).
Instead we can find this out while recalculating $\nu$ (in step 3c). Removing
all circuits containing $e$ (as in step 3b) can be done in time bounded by
the total length of circuits containing $e$, which cannot be larger than
$3M$. Step 3c also needs to go through all circuits passing $e$ and thus
needs the same running time as step 3b. To sum this up, the running time for
this section of the algorithm is of order $N^2$.



2. Limit properties 

In the $N \to \infty$ limit the $b_2$ measure lies in almost the same
interval as $b_1$. The upper limit $b_2 = 1$ is attained if and only if the
graph is bipartite. (If the graph is bipartite, $C_{\tilde{n}}$ is empty and
$\nu(e) = 0$ for all $e$, so $m' = 0$ and $b_2 = 1$. If there exist odd
circuits, $m' > 0$, so $b_2 < 1$.) $b_2$ cannot be as low as 0 (if one marks
all edges, every circuit must be marked). Since the $b_2$ definition is
inspired by the ground-state configuration of the antiferromagnetic Ising
model, we expect a similar lower bound for $b_2$ as for $b_1$. In Appendix A
we argue that the lower bound on $b_2$, as for the $b_1$ measure, is 1/2 in
the $N \to \infty$ limit.



3. The complete algorithm 

So far we have overlooked the central part in calculating the $b_2$ measure,
namely to find the odd circuits. To do this we use a modified version of
Johnson's algorithm [27]. In principle Johnson's algorithm is a depth-first
search where, to avoid futile searching, some vertices are blocked while
stepping down the search tree. The running time for Johnson's algorithm is
$O(M(C + 1))$ (if $M > N$), where $C$ is the total number of circuits. Now
$C$ can grow fast with $N$, which would make the finding of all odd circuits
a quite intractable computation. In many cases the cut-off of the circuit
length, which we introduced above to give less priority to long circuits,
saves us by setting a limit on the search depth. To implement this we let $n$
be the current upper bound on circuit length (or search depth), and $\Sigma$
be the current accumulated length of odd circuits of length $\le n$. As soon
as $\Sigma$ exceeds the cut-off we iteratively decrease $n$ by 2 and
recalculate $\Sigma$ until $\Sigma$ is below the cut-off. If $\Sigma$ is
below the cut-off when the search is over, we rerun the procedure with
$n + 2$ as our new (fixed) $n$ [28]. When the search is over we assign
$\tilde{n}$ the value $n$. For dense bipartite graphs the algorithm is
intractable. In the worst case, the complete bipartite graph $K_{N/2,N/2}$,
there are

$$ C(K_{N/2,N/2}) = \sum_{k=4}^{N} \frac{1}{k} \left[ \frac{(N/2)!}{(N/2 - k/2)!} \right]^2 \tag{10} $$

circuits (where the sum is over even values of $k$) [29], giving a running
time of $O(N^2 C(K_{N/2,N/2}))$. One can of course decide whether or not a
graph is bipartite in linear time, but non-bipartite cases of similar
complexity are easily constructed (by, e.g., adding an isolated triangle). In
practice these worst cases are probably very rare, since a relatively very
low density of odd circuits is needed to get a small $n$: even in the
real-world network with highest bipartivity we have $\tilde{n} = 3$. In this
case ($\tilde{n} = 3$) all odd circuits are found in $O(M^2)$ time.

FIG. 2: Construction of the test networks. (a) shows the generalization of
the ER model (Model 1). (b) shows interpolation between square and triangular
lattices (Model 2). (c) shows the model with predominantly longer circuits
(Model 3). All models are bipartite for $r_{1,2,3} = 1$. Additional edges
create odd circuits (frustration) for lower $r_{1,2,3}$ values. The black
lines illustrate these additional edges. The white and non-white vertices
symbolize a partition giving $b_1 = 1$ in the $r_{1,2,3} = 1$ case (it is not
meant to represent the optimal coloring when $r_{1,2,3} < 1$).

Now we turn to a more complete description of the algorithm. Johnson's
algorithm takes the 'least' (smallest in some enumeration) vertex in a
strongly connected subgraph as its starting point. To find strongly connected
components we use the algorithm in Ref. [30]. To sum up, the algorithm reads:

1. Mark all vertices as unchecked.

2. While there are unchecked vertices, iterate the following:

(a) Pick an unchecked vertex $v$.

(b) Find the largest strongly connected component $A_v$ containing $v$.

(c) Set $A := A_v$ and repeat the following steps as long as
$A \neq \emptyset$:

i. Pick the least vertex $u$ of $A$.

ii. Call a subroutine implementing the modified Johnson's algorithm.
Recalculate $n$ and add $C_n$ to a list $C$. Delete circuits longer than $n$
from $C$.

iii. Delete $u$ from $A$.

3. Set $\tilde{n} := n$.

4. Run the algorithm described above (in Sect. II B 1) to mark edges and
calculate $b_2$.

In all cases, step 2 sets the limit on running time. As mentioned, in most
applications we expect the running time of step 2 to be $O(M^2)$ (similarly
to that of step 4).
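To illustrate the role of the length cut-off in the circuit search, the sketch below enumerates the odd elementary circuits of an undirected graph up to a fixed maximal length. It deliberately replaces the modified Johnson's algorithm and the component decomposition by a plain depth-limited depth-first search, which is simpler but lacks Johnson's blocking of futile branches; the fixed `max_len` stands in for the adaptive bound $n$, and the function name is hypothetical.

```python
def odd_circuits(n, edges, max_len):
    """List elementary circuits of odd length <= max_len in an undirected
    graph on vertices 0..n-1. Each circuit is reported once, as a vertex
    sequence starting at its least vertex."""
    adj = [set() for _ in range(n)]
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    found = []

    def extend(path, on_path):
        start, last = path[0], path[-1]
        for w in adj[last]:
            if w == start and len(path) >= 3 and len(path) % 2 == 1:
                # Both orientations are visited; keep only one of them.
                if path[1] < path[-1]:
                    found.append(list(path))
            elif w > start and w not in on_path and len(path) < max_len:
                # Only vertices larger than the start are allowed, so every
                # circuit is generated from its least vertex.
                path.append(w)
                on_path.add(w)
                extend(path, on_path)
                on_path.discard(w)
                path.pop()

    for start in range(n):
        extend([start], {start})
    return found

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
pentagon = [(i, (i + 1) % 5) for i in range(5)]
print(sorted(odd_circuits(4, k4, 3)))
print(len(odd_circuits(5, pentagon, 3)), len(odd_circuits(5, pentagon, 5)))
```

The pentagon shows the effect of the cut-off: its single odd circuit is invisible at maximal length 3 and appears only when circuits of length 5 are admitted.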



III. THE NETWORKS 
A. Test networks with tunable bipartivity 

To test and compare the $b_1$ and $b_2$ quantities we construct three types
of test networks where the bipartivity can be tuned by model parameters. The
principle behind all models is to start from bipartite networks and add a
smaller or larger number of edges within a partition to create odd circuits.

One type (Model 1) is a quite straightforward generalization of the
Erdős-Rényi (ER) model [31]: We partition the vertices into two disjoint sets
of sizes $N'$ and $N - N'$. Then we add $r_1 M$ edges randomly between
vertices of the different sets, and $(1 - r_1)M$ edges regardless of what set
the vertices belong to (see Fig. 2(a)). In this way we interpret $r_1$ as the
strength of the heterophilous preference in a model where bipartivity is the
only structural bias. The vertex pairs are chosen at random, the only
restriction being that loops and multiple edges are not allowed. If $r_1 = 0$
the model reduces to the ER model, while for $r_1 = 1$ the networks are
bipartite (cf. Ref. [32]). This model is probably the most random (i.e.
having the least structural bias) model with tunable bipartivity. The
disadvantage is that the expectation values of $b_1$ and $b_2$ are hard to
calculate (even in the frustrated limit $r_1 = 0$).
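A generator for Model 1 could look like the following sketch. The function name, the argument `n1` for the size of the first set, and the rejection-sampling strategy are illustrative assumptions; the caller must make sure the requested number of edges is feasible, otherwise the loop will not terminate.

```python
import random

def model1(n, n1, m, r1, seed=0):
    """Random graph on n vertices split into sets {0..n1-1} and {n1..n-1}.
    round(r1 * m) edges join the two sets; the remaining edges are placed
    regardless of set membership. Loops and multiple edges are rejected."""
    rng = random.Random(seed)
    edges = set()

    def add_edges(count, across_only):
        target = len(edges) + count
        while len(edges) < target:
            v, w = rng.randrange(n), rng.randrange(n)
            if v == w:
                continue  # no loops
            if across_only and (v < n1) == (w < n1):
                continue  # both end points lie in the same set
            edges.add((min(v, w), max(v, w)))  # the set drops multiple edges

    m_across = round(r1 * m)
    add_edges(m_across, across_only=True)
    add_edges(m - m_across, across_only=False)
    return sorted(edges)

bipartite = model1(20, 10, 30, 1.0)
er_like = model1(20, 10, 30, 0.0)
print(len(bipartite), len(er_like))
```

With `r1 = 1.0` every edge joins the two sets, so the result is bipartite by construction; with `r1 = 0.0` the partition plays no role.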

Model 2 interpolates between two-dimensional square and triangular lattices.
We start, for $r_2 = 0$, with a triangular grid with periodic boundary
conditions. Let $L$, the linear dimension of the system (i.e. $N = L^2$), be
even. For a non-zero parameter value we delete, uniformly at random,
$r_2 L^2$ of the 'diagonal' edges that create frustration, as illustrated in
Fig. 2(b). To be more precise, if we index the vertices as $(i_x, i_y)$,
$1 \le i_x, i_y \le L$, then the edges are $[(i_x, i_y), (i_x + 1, i_y)]$ and
$[(i_x, i_y), (i_x, i_y + 1)]$ (giving the square grid) plus
$(1 - r_2) L^2$ edges of the form $[(i_x, i_y + 1), (i_x + 1, i_y)]$ chosen
uniformly at random (addition is modulo $L$). This model has a high density
of short circuits. The extremes $r_2 = 0$ and $r_2 = 1$ represent two generic
lattice types. The symmetries of the regular networks simplify the
calculations of e.g. limit properties for the bipartivity measures. If
$r_2 = 1$ the system is bipartite (note that $L$ has to be even for this to
hold), so $b_{1,2} = 1$. When $r_2 = 0$ we have $b_1 = b_2 = 2/3$: For the
lower limit of the $b_1$ quantity, see Ref. [33]. For the lower limit of
$b_2$ we note that $\Sigma(C_3) = 6N$ (since each vertex can be associated
with two triangles). This gives $\tilde{n} = 3$ and $\nu = 2$ for all edges.
Now it is enough to mark $N$ edges (e.g. all $[(i_x, i_y), (i_x + 1, i_y)]$
edges). In this case we note that each edge will have $\nu = 2$ when it is
marked, which means that the marking sequence is optimal and that the number
of iterations cannot be smaller with another choice of edges to mark. So
$b_2 = 1 - N/3N = 2/3$. The major disadvantage of Model 2 is that the average
degree is a function of $r_2$ ($M = (3 - r_2)L^2$). This change in the
average degree can make it harder to separate effects of the shift in
bipartivity from the shift in average degree.
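A sketch of a Model 2 generator is given below. It uses 0-based coordinates and flattens $(i_x, i_y)$ to a single integer; the function name and this indexing convention are assumptions made for illustration, and $L$ is assumed even and at least 4 so that the periodic lattice has no multiple edges.

```python
import random

def model2(L, r2, seed=0):
    """Periodic L x L square lattice plus a random subset of the L^2
    'diagonal' edges; round(r2 * L^2) diagonals are left out."""
    rng = random.Random(seed)

    def idx(ix, iy):
        # Flatten coordinates; the modulo implements periodic boundaries.
        return (ix % L) * L + (iy % L)

    edges = []
    diagonals = []
    for ix in range(L):
        for iy in range(L):
            edges.append((idx(ix, iy), idx(ix + 1, iy)))      # square grid
            edges.append((idx(ix, iy), idx(ix, iy + 1)))      # square grid
            diagonals.append((idx(ix, iy + 1), idx(ix + 1, iy)))
    kept = L * L - round(r2 * L * L)
    edges.extend(rng.sample(diagonals, kept))
    return edges

triangular = model2(4, 0.0)   # all diagonals present
square = model2(4, 1.0)       # no diagonals, bipartite for even L
print(len(triangular), len(model2(4, 0.5)), len(square))
```

The edge counts follow $M = (3 - r_2)L^2$, which makes the dependence of the average degree on $r_2$ explicit.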

In both Model 1 and (even more) Model 2, triangles will dominate the set of
odd circuits. To test networks with predominantly longer circuits we
construct a Model 3 as follows (see Fig. 2(c)): We make two circulants
($i = 1, 2$) of size $N/2$ with the vertices $(v^i_1, \dots, v^i_{N/2})$ and
edges $\{(v^i_1, v^i_2), \dots, (v^i_{N/2-1}, v^i_{N/2}), (v^i_{N/2}, v^i_1)\}$.
Then we add $M_{\mathrm{trans}}$ transverse edges between the circulants.
$M_{\mathrm{trans}}/2$ of these edges are placed at equal distance
$N/M_{\mathrm{trans}}$ from each other, separating the double circulant into
$M_{\mathrm{trans}}/2$ 'sectors.' Then we fill up each sector with another
transverse edge: With probability $r_3$ we add a $(v^1_j, v^2_j)$ edge (such
that $(v^1_j, v^2_j)$ is none of the previously added transverse edges),
otherwise we add a $(v^1_j, v^2_{j+1})$ edge (addition modulo $N/2$). We
note, to a first approximation, that if $r_3 = 0$, marking (in the process of
calculating $b_2$) one edge between every pair of neighboring transverse
edges on one of the circulants is needed to mark the shortest odd circuits.
This will make $b_2 \in O(1 - M_{\mathrm{trans}}/M)$.



B. Real-world networks 

Physicists' network studies have, in the spirit of statistical mechanics,
emphasized the properties remaining when the system grows beyond any limit.
Bipartivity, as discussed above, is well defined for all system sizes. Still,
it is a quantity that can potentially suffer from finite-size effects (from
the fact that not all real neighbors of all actors in an empirically
constructed social network are a part of the graph) and is therefore
preferably measured for large networks. Now the problem is to find data for
large-scale real-world networks of social interaction. In general two methods
have been successful for this purpose: one either uses professional
collaborations of some sort, or data from interaction over the Internet
(either in Internet communities or through email exchange [35]).



1. Professional collaboration networks 

In the professional collaboration networks we study, the vertices are
professionals of some field (networks of scientists and company directors are
considered in this paper; the movie-actor network is another frequently
studied example), and an edge represents that two actors have been involved
in the same professional collaboration. This is sometimes referred to as a
"one-mode" representation of an affiliation network (as opposed to the
bipartite two-mode representation discussed in Sect. I).

Professional collaboration networks are no doubt interesting in their own
right as accounts of the interaction dynamics of the respective fields.
Assuming that the formation of professional ties follows similar principles
to general human interaction, we can use professional collaboration networks
to draw conclusions about the structure of more general social networks.
However, in at least one respect professional collaboration differs from
general social interaction: A collaboration tie does not necessarily imply a
strong personal acquaintance, but in these networks each collaboration
constitutes a fully connected cluster. This leads to a higher fraction of
short circuits than in, say, a friendship network.
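The way a collaboration turns into a fully connected cluster can be made concrete with a small one-mode projection. The sketch below assumes the affiliation data is available as a mapping from each collaboration to its list of participants; the function name and the toy data are hypothetical.

```python
from itertools import combinations

def one_mode_projection(affiliations):
    """Link every pair of actors that share at least one collaboration.
    `affiliations` maps a collaboration label to a list of its actors."""
    edges = set()
    for actors in affiliations.values():
        # Every collaboration becomes a clique among its participants.
        for a, b in combinations(sorted(set(actors)), 2):
            edges.add((a, b))
    return sorted(edges)

papers = {"p1": ["A", "B", "C"], "p2": ["C", "D"]}
print(one_mode_projection(papers))
```

The two-mode network of this toy example has no circuits at all, while its projection already contains a triangle from the three-author collaboration, which is the mechanism behind the high density of short odd circuits noted above.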

One of the professional collaboration networks we use is that of scientists
who have uploaded manuscripts to the preprint repository arxiv.org. Two
scientists are linked if their names (identified by surname and initials)
appear together on at least one preprint. A detailed description of this
network can be found in Ref. [36]. In the other professional collaboration
network the vertices represent company directors from the Fortune top 1000
list of companies in the USA in the year 2001. An edge (collaboration) in
this network means that two directors sit on the board of the same company. A
detailed description of this network can be found in Ref. [37]. Sizes of the
networks can be seen in Table I.



2. Online interaction networks 

In online interaction networks, the vertices are users of Internet
communities and an arc (A,B) is added if A contacts B, or if A adds B to
his/her list of friends [12, 34]. Another kind of online interaction network
is the email network [35], where an arc can be assigned if an email is sent,
or if a person adds another to his/her address book. Just as for professional
collaboration networks, one can argue that online interaction networks are
representative of general social networks. One can assume that new contacts
are formed through preference-matching searches to a larger extent, and
through introduction by mutual friends to a lesser extent, than in general
friendship networks. Since the introduction of mutual friends to each other
is believed to be the major cause of high clustering (large density of
triangles, or large transitivity) [38], one can expect a lower clustering in
networks of online interaction (still, the clustering in these networks seems
to be finite in the $N \to \infty$ limit [12]).

The specific online interaction networks we consider are constructed from the
Internet communities nioki.com and pussokram.com. The nioki.com data is
described in Ref. [34]. In this data an arc (A,B) means that B is listed as a
friend by A, which allows A to see if B is online and send instant messages
to B. In the pussokram.com data the arcs correspond to communication between
the users. There are four different types of communication in this specific
network (all described in detail in Ref. [12]). We use the networks obtained
from two types of interaction ("messages," like ordinary emails within the
community, and "guest book," where one user contacts another by writing in
his/her guest book), and the network of any of the four types. Network sizes
can be found in Table I.

Another large difference between the pussokram.com and nioki.com data is that the former community has a very pronounced romantic profile, encouraging flirts and romantic correspondence. The nioki.com community also has a search engine to "trouve l'amour" (find love), but that is all.

Apart from the two Internet communities, we study another type of online interaction network based on the flow of email. For this network all incoming and outgoing email traffic to a server was logged for around three months [35]. The server handles undergraduate students' email accounts at Kiel University, Germany. Thus there are two categories of vertices: internal vertices, whose activity is accurately mapped, and external vertices, which only have edges leading to internal vertices. In this study we restrict ourselves to the network of internal-internal contacts. The reason we do not include external contacts is that we would miss the (probably many) circuits containing external-external edges, which would bias the bipartivity.
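
The restriction to internal-internal contacts can be sketched as follows. This is only an illustration and not the procedure used for the data of Ref. [35]; the function name `internal_subnetwork`, the arc-list representation and the example vertex labels are assumptions made here.

```python
def internal_subnetwork(arcs, internal_vertices):
    """Return the arcs whose sender and receiver are both internal.

    arcs: iterable of (sender, receiver) pairs (hypothetical log format).
    internal_vertices: iterable of the vertices handled by the logged server.
    """
    internal = set(internal_vertices)
    return [(a, b) for (a, b) in arcs if a in internal and b in internal]


# Hypothetical example: 'x' and 'y' stand for external addresses.
example_arcs = [("a", "b"), ("a", "x"), ("y", "b"), ("b", "a")]
internal_only = internal_subnetwork(example_arcs, {"a", "b"})
```

Arcs with at least one external endpoint are dropped, so every circuit that remains is one whose edges were all observed by the server.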

3. Networks from interviews and field surveys

Apart from the above networks, all obtained from databases, we also measure the bipartivity of two networks obtained from interviews and field surveys. The first data set was gathered by observing the interaction between members of a university karate club [39]. We also study the network of acquaintance ties in a prison [40]. The outgoing arcs from A correspond to prisoners listed by A in response to the question: "What fellows on the tier are you closest friends with?" Due to their acquisition methods, these kinds of real-world networks have to be rather small. This can, as mentioned, result in finite-size effects. On the other hand they, most likely, reflect the structure of real acquaintance networks more truly.



IV. RESULTS 

In this section we present the results for the test networks and the measurements for the real-world social networks.



A. Test networks 

As expected, both $b_1$ and $b_2$ increase monotonically as functions of the parameters $r_1$, $r_2$ and $r_3$ of (almost [41]) all our test networks (see Fig. 3). This is encouraging and suggests that both $b_1$ and $b_2$ are quite relevant measures of bipartivity.

The Model 1 measurements shown in Fig. 3(a) are
made with the model parameters N = 2N =100 and 
M = 800. We have checked many other sizes too, but all have the characteristic appearance of Fig. 3(a): a linear increase of $b_1$ and $b_2$ for larger $r_1$ and a flatter slope for $r_1$ close to zero. This shape is expected from the discussion in Sect. II: in networks where a heterophilous preference is the only structure-inducing force, only the strong-preference limit gives a strong measurable effect. Close to the ER limit $r_1 \to 0$, the original two partitions will not be identified correctly; only when the two partitions (to a large extent) have different signs will the bipartivity be proportional to the strength of the heterophilous preference.

As seen in Fig. 3(b), Model 2 shows an almost linear functional form of $b_{1,2}(r_2)$. In this case, triangles dominate the odd circuits even at small values of $r_2$. Tuning $r_2$ will give a proportional increase of the number of triangles. Thus a linear $r_2$ dependence of $b_2$ would be expected.

FIG. 3: The bipartivity measures versus the model parameters of the three models defined in Section III A. (a) shows the result for Model 1, (b) shows the result for Model 2, and (c) shows the result for Model 3. All error bars would be smaller than the symbol size. The monotonic growth of the bipartivity measures shows that the measures behave as expected.



Model 3 also has linear $b_2$ vs. $r_3$ curves. The model
parameters used are N = 100 and Mtrans = 10- As men- 
tioned in Section lTll Al we expect b2 ~ Mtrans /^^ for - 0, 
which is confirmed in Fig. 3(c).

The measurements for both $b_1$ and $b_2$ are averaged over 100 network realizations. The XMC scheme for the $b_1$ quantity is run at 24 temperatures in parallel, between temperatures 0.01 and 2. The other simulation parameters are t_avg = 4 x 10^, t_measure = 4, t_quench = 20 and t_exch = 1000. These are more modest parameter values than we will use for the real-world networks, but the test networks are also much smaller, and since the distributions of $b_1$ and $b_2$ are (likely) symmetric, the network average helps to reduce the error.
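
To make the role of the parallel temperatures concrete, the following is a minimal sketch of an exchange (replica) Monte Carlo search for the optimal two-coloring. It is not the implementation used for the measurements: the function names, the energy scale (one unit per frustrated edge), the update schedule (one sweep of single-spin Metropolis flips followed by one round of exchange trials) and the restriction to simple graphs are all assumptions made here, and the schedule parameters of the text (t_avg, t_measure, t_quench, t_exch) are not modelled.

```python
import math
import random


def frustrated_edges(edges, spin):
    """Antiferromagnetic Ising energy: number of edges joining equal spins."""
    return sum(1 for u, v in edges if spin[u] == spin[v])


def b1_exchange_monte_carlo(n_vertices, edges, temperatures, sweeps, seed=0):
    """Estimate b1 = 1 - (fewest frustrated edges found) / M."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n_vertices)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # One replica (spin configuration) per temperature, coldest first.
    temps = sorted(temperatures)
    replicas = [[rng.choice((-1, 1)) for _ in range(n_vertices)] for _ in temps]
    energies = [frustrated_edges(edges, s) for s in replicas]
    best = min(energies)
    for _ in range(sweeps):
        # Metropolis single-spin flips within each replica.
        for k, temp in enumerate(temps):
            spin = replicas[k]
            for _ in range(n_vertices):
                i = rng.randrange(n_vertices)
                same = sum(1 for j in neighbors[i] if spin[j] == spin[i])
                delta = len(neighbors[i]) - 2 * same  # change in frustrated edges
                if delta <= 0 or rng.random() < math.exp(-delta / temp):
                    spin[i] = -spin[i]
                    energies[k] += delta
            best = min(best, energies[k])
        # Exchange trials between replicas at neighboring temperatures.
        for k in range(len(temps) - 1):
            d_beta = 1.0 / temps[k] - 1.0 / temps[k + 1]
            d_energy = energies[k] - energies[k + 1]
            if d_beta * d_energy >= 0 or rng.random() < math.exp(d_beta * d_energy):
                replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return 1.0 - best / len(edges)


# Two tiny test graphs: a triangle (odd circuit) and a square (bipartite).
triangle = [(0, 1), (1, 2), (2, 0)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
temps = [0.01, 0.1, 0.5, 2.0]
b1_triangle = b1_exchange_monte_carlo(3, triangle, temps, sweeps=200)
b1_square = b1_exchange_monte_carlo(4, square, temps, sweeps=200)
```

The exchange step lets a configuration found at a high temperature migrate to the coldest level, which is the mechanism the text relies on to escape local minima. The value returned is a lower bound on the true $b_1$, since the search is not guaranteed to reach the ground state.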



B. Real-world social networks 



Now we turn to the results for the bipartivity measures of real-world networks. The values are presented in Table I. For comparison we also give values for the clustering coefficient (density of triangles) $C$ and the density of squares $D$ in both directed and undirected versions [43]. Undirected networks are constructed by taking the reflexive closure. At first glance at the table we arrive at the pleasing conclusion that the bipartivity for the pussokram.com networks is very high (as expected from a network of romantic interaction of mostly heterosexuals). But disappointingly, the bipartivity measures show similarly high values for the nioki.com and email networks. This can be explained by the fact that the nioki.com data, just like the pussokram.com data, has very low $C$ and $D$ values, and presumably very few circuits at all. Branches (subgraphs without circuits that can be isolated by cutting one edge) do not lower either $b_1$ or $b_2$, regardless of the gender of the agents. The email network does have a high clustering, but still rather high bipartivity. The reason is that the email network is rather heavily fragmented and contains many isolated subnetworks of two vertices and one edge, and of three vertices and two edges. Such subnetworks do not affect the clustering coefficient but tend to increase the bipartivity measures [43].

The collaboration networks consist of a number of fully connected clusters (each corresponding to a specific collaboration) that are interconnected. It is thus natural that we see low bipartivity and a high density of short circuits. The lower bipartivity values for the company director network can be explained by the larger average size of such fully connected clusters: the average number of vertices per collaboration is 9.5 for the corporate director network and 2.5 for the scientific collaboration data [34, 37].

The two small networks constructed from field surveys (the "karate club" and "prison" networks of Table I, discussed in Section III B 3) show mid-range bipartivities and relatively high values of $C$ and $D$. From the above discussion we can expect that the bipartivity of large, real acquaintance networks is somewhere between those of the collaboration networks and the Internet community networks (because they probably have higher clustering than Internet community networks, and a lower number of fully connected clusters than the collaboration networks). Encouragingly, this is exactly what we see in Table I. Of course, the very small system sizes might affect the results, but it seems hard to believe that the bipartivity measures of real-world acquaintance networks would be close to either the upper or the lower limit.

We conclude this section with a note on the parameters for the XMC optimization. The measurements of $b_1$ for all real-world networks (except the nioki.com data, where we study the convergence more carefully) are done just once with the following simulation parameters: $N_T = 24$ (with temperatures from 0.002 to 5), t_avg = 1 x 10^, t_measure = 16, t_quench = 40 and t_exch = 2 x 10^.
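
The effect of small isolated subnetworks on the two kinds of quantity can be checked on a toy example. The sketch below assumes that $b_1$ is the fraction of edges that are not frustrated in the best two-coloring, and it computes that value by exhaustive search, which is only feasible for very small graphs. The function names and the example graph are hypothetical, and the triangle count is used as a stand-in for the clustering coefficient.

```python
from itertools import product


def b1_exact(n_vertices, edges):
    """Exact b1 by trying every two-coloring (tiny graphs only)."""
    fewest = len(edges)
    for coloring in product((0, 1), repeat=n_vertices):
        frustrated = sum(1 for u, v in edges if coloring[u] == coloring[v])
        fewest = min(fewest, frustrated)
    return 1.0 - fewest / len(edges)


def count_triangles(n_vertices, edges):
    """Number of vertex triples that are pairwise connected."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for a in range(n_vertices)
        for b in range(a + 1, n_vertices)
        for c in range(b + 1, n_vertices)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
    )


triangle = [(0, 1), (1, 2), (2, 0)]
# The same triangle plus three isolated components of two vertices and one edge.
fragmented = triangle + [(3, 4), (5, 6), (7, 8)]

b1_alone = b1_exact(3, triangle)
b1_fragmented = b1_exact(9, fragmented)
triangles_alone = count_triangles(3, triangle)
triangles_fragmented = count_triangles(9, fragmented)
```

Under these assumptions the triangle alone gives $b_1 = 2/3$ and the fragmented graph gives $b_1 = 5/6$, while the number of triangles is one in both cases.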



V. SUMMARY AND DISCUSSION 

This paper concerns the quantification of the network structure 'bipartivity': how close to bipartite a given graph is. We propose two measures for this quantity. One quantity, $b_1$, is based on the optimal two-coloring of the network or, equivalently, the ground state of the antiferromagnetic Ising model on the network. Computing the exact value of this quantity (which has been used in different roles elsewhere) is an NP-complete problem and thus in general not feasible. Instead we seek an approximate solution by a simulated annealing approach. The simulated annealing is based on the exchange Monte Carlo scheme. We argue that this unorthodox minimization method helps us avoid local minima of the energy landscape of the antiferromagnetic Ising model. Furthermore we develop a measure $b_2$ based on the count of odd circuits that, for almost all networks, is calculable in polynomial time.

We propose three different random graph test models where one can interpolate between arguably non-bipartite and bipartite graphs by tuning a control parameter. Both our bipartivity measures are shown to increase monotonically with tuning the control parameters towards the bipartite extreme. From this we conclude that the bipartivity measures really quantify the notion of bipartivity.

TABLE I: Sizes, clustering coefficients and bipartivity measures $b_1$ and $b_2$ for real-world social networks.

network        N        M                   C                 D                 b1                b2
                        dir.      undir.    dir.     undir.   dir.     undir.   dir.     undir.   dir.     undir.
all contacts            174 662   lis (nS4  01 9     U.U-LZ.  n ni 7   \j.ooy   u.ouu    n Q48    \j.y£.o
messages       20 691   73 346    52 435    0.0052   0.0061   0.0081   0.0061   0.897    0.892    0.984    0.964
guestbook      21 545   76 257    55 076    0.014    0.014    0.015    0.021    0.863    0.889    0.943    0.965
nioki.com      50 259   405 742   239 452   0.0076   0.0065   0.016    0.013    0.842    0.855    0.956    0.975
emails         637      554       443       0.11     0.16     0.071    0.14     0.944    0.944    0.971    0.941
arxiv.org      52 909   X         490 600   X        0.45     X        0.35     X        0.630    X        0.623
directors      7 475    X         48 899    X        0.21     X        0.37     X        0.549    X        0.507
karate club    34       X         78        X        0.26     X        0.26     X        0.782    X        0.782
prison         64       182       85        0.19     0.31     0.089    0.14     0.786    0.878    0.918    0.847

By considering example networks we infer that bipartivity is a structure that cannot be measured by currently popular structural measures, such as the clustering coefficient. At the same time any sensible quantification of bipartivity probably has to have a negative correlation with the clustering coefficient for most networks
(with exceptions for exotic cases like Fig. E!^)) — so, in 
that case bipartivity and clustering are not independent.

We measure $b_1$ and $b_2$ of a number of real-world networks, constructed from online interaction, professional collaborations, and field surveys. As expected, we see high bipartivity values for data from the Internet community pussokram.com, where romantic contacts are encouraged, and hence a high degree of heterophilous interaction is expected. We also see the expected low bipartivity values for the professional collaboration and empirical acquaintance networks we study. Disappointingly, we cannot use our bipartivity measures to distinguish between the networks driven by romantic and friendship (or professional) contacts. To do this, other structures and the network sizes have to be taken into account in a more elaborate analysis (that is beyond the scope of this study).

FIG. 4: Marking of edges (in matrix representation) while calculating the $b_2$ quantity for a fully connected graph. '-1' means that $\nu$ at that position is decreased by one unit; '= 0' means that $\nu = 0$ at that position.

So far our examples of networks with high bipartivity have been romantic networks and networks of sexual contacts. Network-based studies of sexually transmitted diseases [9] are a potentially interesting area for bipartivity measures, as the transmission rates for homosexual and heterosexual contacts differ [44]. Apart from romantic and sexual networks, there are other areas where the bipartivity measure may prove useful. One can consider a trade network where some agents are more or less pronounced sellers and others are primarily buyers (cf. Ref. [45]); such networks would not have a neutral bipartivity. Another application is for the 'genealogical' network of a disease outbreak: some contagious diseases have a relatively stable duration between when an individual is infected and when he or she becomes infectious. Epidemics of these types of diseases can therefore roughly be divided into different generations of infected individuals [44]. A network consisting of possible edges of infection, for an outbreak of this type of disease, should therefore have very few odd-length circuits. The reason is that the infection is only transmitted between succeeding generations, which generates only circuits of even length (in the reflexive closure of the network). When reconstructing the paths this kind of disease has taken in a population, maximizing the bipartivity measures can be a method for excluding redundant infectious edges.
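
The generation argument can be illustrated with a small sketch. The outbreak data below are invented for illustration, and the function name `is_bipartite` is an assumption made here. The check is a standard breadth-first two-coloring, which fails exactly when the undirected graph contains an odd circuit.

```python
from collections import deque


def is_bipartite(n_vertices, edges):
    """Breadth-first two-coloring; returns False iff an odd circuit exists."""
    neighbors = [[] for _ in range(n_vertices)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    color = [None] * n_vertices
    for start in range(n_vertices):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


# Hypothetical outbreak: generation[i] is the generation of individual i.
generation = [0, 1, 1, 2, 2, 2, 3]
# Possible infection edges, each joining two succeeding generations.
possible = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 4), (2, 5), (4, 6), (5, 6)]
# One extra edge inside generation 2, inconsistent with the generation picture.
with_redundant = possible + [(3, 4)]

consistent = is_bipartite(len(generation), possible)
inconsistent = is_bipartite(len(generation), with_redundant)
```

In this toy case the parity of the generation number is a valid two-coloring of `possible`, whereas the extra edge (3, 4) closes the triangle 1-3-4 and therefore breaks bipartiteness.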

We conclude with an analogy to linear algebra: we have identified a new dimension (structure) and proposed base vectors (measures) that unfortunately are not orthogonal to the other dimensions.

APPENDIX A: THE LOWER BOUND OF THE $b_2$ MEASURE



In this Appendix we argue that, in the $N \to \infty$ limit, the lower bound for $b_2$ is 1/2 (just like $b_1$). First we conjecture that the minimal value for $b_2$, just as for $b_1$, is attained for complete graphs. (This will be further motivated below.)

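To assess $b_2$ for complete graphs, we note that [46]

$$\sum_{\mathrm{odd}\ 3 \le l \le n} \Sigma(C_l) = \sum_{\mathrm{odd}\ 3 \le l \le n} \frac{N!}{2l\,(N-l)!} \ge \frac{N(N-1)(N-2)}{6}, \tag{A1a}$$

$$M = \frac{N(N-1)}{2}, \tag{A1b}$$

so $n = 3$, which results in $\nu = N - 2$ for each edge.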

Now we apply the marking procedure of Sec. II B 1. Marking an edge $(u,v)$ makes $\nu(u,v) = \nu(v,u) = 0$. Furthermore, $\nu$ of every edge $(u,w)$ and $(v,w)$ ($w \neq u, v$) will be decreased by one, since the triangle $[u,v,w]$ now contains a marked edge. The discussion will be simplified by considering a matrix representation of $\nu(u,v)$. Marking $(u,v)$ sets $\nu(u,v) = \nu(v,u) = 0$ and decreases the $u$'th and $v$'th columns, and the $u$'th and $v$'th rows, by one (an example is given in Fig. 4(a)). Marking another edge $(u',v')$ ($u'$ and $v'$ are different from both $u$ and $v$, otherwise $\nu(u',v')$ would not be maximal) will have the same effect as marking the first. For positions like $(u,v')$ the original $\nu$ is decreased by 2 (see Fig. 4(b)), since the edge has lost the two passing triangles $[u,u',v']$ and $[v',u,v]$. Continuing this process we see that it takes $N/2 + O(1)$ markings for $\nu$ of each edge to be decreased by two units, and thus $m' = N^2/4 + O(N)$ markings to make $\nu = 0$ for all edges. This gives $b_2 = 1/2$ in the $N \to \infty$ limit. Since the appropriateness of $b_2$ as a bipartivity measure is not really dependent on the limit values, we will not give a rigorous proof that the correction is of a lower order for all levels of the marking procedure (one level is the $N/2 + O(1)$ edges that need to be marked for $\nu$ to be decreased by at least two units for each edge).
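
The marking argument can be tried out numerically on small complete graphs. The sketch below rests on several assumptions that are not spelled out in this appendix: that only triangles are counted ($n = 3$), that $\nu$ of an edge is the number of triangles through it that contain no marked edge, that the edge with maximal $\nu$ is marked at each step, and that $b_2 = 1 - m'/M$ with $m'$ the number of marked edges. The function name is hypothetical, and for small $N$ the value of $n$ prescribed by the main text may differ from 3.

```python
from itertools import combinations


def b2_complete_graph_triangles(n_vertices):
    """Greedy marking on the complete graph K_N (N >= 2), triangles only."""
    edges = list(combinations(range(n_vertices), 2))
    marked = set()

    def is_marked(a, b):
        return (min(a, b), max(a, b)) in marked

    def nu(edge):
        """Triangles through `edge` that contain no marked edge."""
        u, v = edge
        if edge in marked:
            return 0
        return sum(
            1
            for w in range(n_vertices)
            if w != u and w != v and not is_marked(u, w) and not is_marked(v, w)
        )

    while True:
        candidate = max(edges, key=nu)  # first edge with maximal nu
        if nu(candidate) == 0:
            break  # every remaining triangle contains a marked edge
        marked.add(candidate)
    return 1.0 - len(marked) / len(edges)


b2_k4 = b2_complete_graph_triangles(4)
b2_k12 = b2_complete_graph_triangles(12)
```

For $N = 4$ this sketch marks two edges and returns $1 - 2/6 = 2/3$. When the loop stops, the unmarked edges form a triangle-free graph, so the returned value can never exceed $\lfloor N^2/4 \rfloor / M$.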

Now we argue that $b_2$ takes its minimal value for complete graphs. First we note that the number of circuits of length $n$ per edge, for any $n$, is largest in a complete graph [2]. So if we set $n$ arbitrarily and discard circuits of length less than $n$ in the calculation of $\nu$, the fully connected graph would give the highest $m'$ value and thus the lowest bipartivity measure. The strongest candidate for a lower bipartivity measure than that of a fully connected graph would thus be a graph such that $\Sigma(C_n) < 3M$ and $\Sigma(C_{n+2})$ is as big as possible for some $n$. But removing the number of edges needed for $\Sigma(C_n) < 3M$ to hold in a fully connected graph not only reduces the contribution to $\nu$ from circuits of length $n$ but also that from circuits of length $n + 2$ to a similar extent. If one performs the approximate marking procedure outlined above for circuits of length five, one starts from $\nu = (N - 2)(N - 3)(N - 4)$ and it takes $N/2 + O(1)$ markings to decrease every $\nu$ by at least $2N^2$. This means that the number of edges that need to be marked to make $\nu = 0$ for every edge is the same if circuits of length five are considered. It also means that a graph as outlined above (with $\Sigma(C_n) < 3M$ and $\Sigma(C_{n+2})$ as big as possible) probably does not have a lower $b_2$ than a complete graph.

FIG. 5: The current value of $b_1$ (at the lowest-temperature level of the cooling) as a function of running time $t$ for ten independent measurements of the directed version of the nioki.com data.

To summarize, the $b_2$ measure lies in the interval $[1/2, 1]$ in the $N \to \infty$ limit. The finite-size corrections to $b_2$ for fully connected graphs, however, turn out to make $b_2$ slightly less than 1/2.
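
As a worked check of the limit, assume (as the marking argument of this appendix implies) that $b_2 = 1 - m'/M$, with $m' = N^2/4 + O(N)$ marked edges and $M = N(N-1)/2$ edges in a complete graph. Then:

```latex
b_2 \;=\; 1 - \frac{m'}{M}
    \;=\; 1 - \frac{N^2/4 + O(N)}{N(N-1)/2}
    \;=\; 1 - \frac{1}{2}\,\frac{N}{N-1} + O\!\left(\frac{1}{N}\right)
    \;\longrightarrow\; \frac{1}{2}
    \qquad (N \to \infty).
```

Since $N/(N-1) > 1$ for finite $N$, the leading term alone is already slightly below 1/2. The sign of the remaining $O(1/N)$ correction depends on the $O(N)$ term in $m'$, which is not determined here.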



APPENDIX B: CONVERGENCE OF THE SIMULATED 
ANNEALING 

To analyze the convergence of the simulated annealing scheme we run ten independent calculations of the $b_1$ quantity (with the same parameter values as in Sect. IV B). The individual time evolutions of $b_1$ (at the lowest temperature $T = 0.002$) for the different runs are shown in Fig. 5. We note that already after the first quench $b_1$ is only 3% away from the value at the end of the run, and after 50 time steps $b_1$ is within 0.5% of the value after 1 x 10^ time steps. We note that there is no way of constructing a statistically valid confidence interval for the true $b_1$ value, since an arbitrarily complex energy landscape could have a global minimum with a basin of attraction of measure zero. There are, however, indications that this is seldom a major problem, at least not for the bisection problem [13].

An interesting observation from Fig. 5 is the step-like structure. This is a result of the exchange trials: after $t \approx 100$ the local minimum has been found, but at the temperature in question the system is in principle stuck in a confined part of the configuration space and cannot enter lower-lying energy valleys. On the time scale t = 10^ there is another jump in the $b_1$ value. This happens because replicas from other parts of the configuration space reach the lowest level. At around t = 10^ the current highest $b_1$ values (lowest energy) reach another plateau. At this time, each replica should have covered the whole temperature range several times. This second plateau has two encouraging implications: firstly, that the correct value of $b_1$ is probably not very far from the measured value; secondly, that the exchange steps really are helpful. If one wants to run this algorithm more efficiently, the $t_\mathrm{exch}$ we use is far too large (but beneficial for separating the time scales in the discussion above). Ideally $t_\mathrm{exch}$ should probably be chosen to be of the same order as the first jump (from the regular Monte Carlo steps); in the nioki.com network (displayed in Fig. 5) this would be $t \approx 100$.